Abstract

In this paper we give a simple new proof of a result of Pittel and 
Wormald concerning the asymptotic value and (suitably rescaled) limiting 
distribution of the number of vertices in the giant component of G(n,p) 
above the scaling window of the phase transition. Nachmias and Peres 
used martingale arguments to study Karp's exploration process, obtaining 
a simple proof of a weak form of this result. We use slightly different 
martingale arguments to obtain a much sharper result with little extra 
work. 

1 Introduction and results 

The component of a random graph containing a given vertex may be 'explored' 
by a step-by-step process that is by now well known, described in detail below. 
A key feature of this process is that vertices are 'examined' one at a time, 
and tested for edges to 'new' vertices. This means that the behaviour of the 
exploration is closely connected to that of a certain random walk. In the context 
of random graphs, this process was introduced by Karp [4] in 1990; slightly
earlier, Martin-Löf [5] used essentially the same process in a different context,
namely the study of epidemics, where it arises even more naturally. Somewhat
later, Aldous [1] introduced a variant of the process adapted to explore all
components of a random graph; recently, analyzing this latter exploration with
martingale techniques related to those in [5], Nachmias and Peres [6] gave a
simple proof that in the weakly supercritical range, i.e., when $p = (1+\varepsilon)/n$
where $\varepsilon = \varepsilon(n)$ satisfies $\varepsilon \to 0$ but $\varepsilon^3 n \to \infty$, the largest component of $G(n,p)$
contains $2\varepsilon n + o_p(\varepsilon n)$ vertices. (They also studied the weakly subcritical case,
which we shall not discuss further here.) 

Here we shall analyze the same process more carefully, obtaining a simple
new proof of the following asymptotic normality result due to Pittel and
Wormald [8]. Let $\rho = \rho_\lambda$ denote the survival probability of the Galton-Watson
branching process in which the number of offspring of each individual has a
Poisson distribution with mean $\lambda$. For $\lambda > 1$ we may write $\rho_\lambda$ as the unique
positive solution to

$$1 - \rho = e^{-\lambda\rho}. \tag{1}$$

When $\lambda > 1$ we write $\lambda_*$ for $\lambda(1 - \rho_\lambda)$; this is often known as the dual branching
process parameter to $\lambda$, and satisfies $\lambda_* < 1$ and $\lambda_* e^{-\lambda_*} = \lambda e^{-\lambda}$. (The corresponding
Poisson branching process provides an approximation of the random
graph in the vicinity of a generic vertex outside the giant component.)
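The quantities $\rho_\lambda$ and $\lambda_*$ are easy to evaluate numerically. The following is a minimal sketch, not part of the original argument: it solves (1) by bisection and checks the duality relation $\lambda_* e^{-\lambda_*} = \lambda e^{-\lambda}$. The function names `survival_probability` and `dual_parameter`, and the use of bisection, are our own illustrative choices.

```python
import math

def survival_probability(lam):
    """Positive root rho of 1 - rho = exp(-lam * rho), assuming lam > 1.

    g(r) = 1 - r - exp(-lam * r) is concave with g(0) = 0, is maximal at
    r = log(lam) / lam (where it is positive) and is negative at r = 1,
    so the positive root lies in [log(lam) / lam, 1].
    """
    if lam <= 1:
        raise ValueError("the survival probability is positive only for lam > 1")
    lo, hi = math.log(lam) / lam, 1.0
    for _ in range(200):                      # plain bisection
        mid = (lo + hi) / 2
        if 1 - mid - math.exp(-lam * mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def dual_parameter(lam):
    """Dual branching process parameter lam_* = lam * (1 - rho_lam)."""
    return lam * (1 - survival_probability(lam))

if __name__ == "__main__":
    for lam in (1.01, 1.5, 2.0):
        rho, dual = survival_probability(lam), dual_parameter(lam)
        # the last two printed columns should agree
        print(lam, rho, dual, dual * math.exp(-dual), lam * math.exp(-lam))
```

For $\lambda$ close to 1 the computed root is close to $2(\lambda - 1)$, in line with the barely supercritical asymptotics discussed below.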

Theorem 1. Let $p = \lambda/n$ where $\lambda = \lambda(n)$ satisfies $\lambda = O(1)$ and $(\lambda - 1)^3 n \to \infty$
as $n \to \infty$, and let $L_1$ denote the number of vertices in the largest component
of $G(n,p)$. Then

$$\frac{L_1 - \rho n}{\sigma} \xrightarrow{d} N(0,1),$$

where $\xrightarrow{d}$ denotes convergence in distribution, $N(0,1)$ is the standard normal
distribution, $\rho = \rho_\lambda > 0$ is defined by (1), and

$$\sigma^2 = \frac{\rho(1-\rho)}{(1-\lambda_*)^2}\, n.$$

The special case of this result in which $\lambda$ is constant goes back to Stepanov [9]
(see also Pittel [7]); the form above is due to Pittel and Wormald [8], who proved
much more, including asymptotic joint normality of the sizes of the largest
component and of its 2-core.

Specializing to the barely supercritical case, the formulae above simplify
considerably. Indeed, it is easy to check that if $\lambda = 1 + \varepsilon$ and $\varepsilon \to 0$, then
$\rho_\lambda = 2\varepsilon + O(\varepsilon^2)$, and $\lambda_* = 1 - \varepsilon + O(\varepsilon^2)$. Thus Theorem 1 has the following
corollary.

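Corollary 2. Let $\varepsilon = \varepsilon(n)$ satisfy $\varepsilon \to 0$ and $\varepsilon^3 n \to \infty$, and let $L_1$ denote the
number of vertices in the largest component of $G(n, (1+\varepsilon)/n)$. Then

$$\frac{L_1 - \rho n}{\sqrt{2\varepsilon^{-1} n}} \xrightarrow{d} N(0,1), \tag{2}$$

where $\rho > 0$ is defined by (1) with $\lambda = 1 + \varepsilon$. □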

Under the conditions of Corollary 2 we have $\rho \sim 2\varepsilon$, while the standard
deviation $\sqrt{2\varepsilon^{-1} n}$ is $o(\varepsilon n)$, so Corollary 2 implies in particular the result of
Nachmias and Peres [6] mentioned earlier.
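To see how quickly the barely supercritical approximations take hold, one can compare the variance $\sigma^2 = n\rho(1-\rho)/(1-\lambda_*)^2$ (the value obtained at the end of the proof in Section 2) with its approximation $2n/\varepsilon$. The sketch below is illustrative only; the helper names `rho` and `sigma_sq_over_n` are hypothetical, and the root of (1) is found by a simple bisection.

```python
import math

def rho(lam):
    """Positive root of 1 - r = exp(-lam * r) for lam > 1, by bisection."""
    lo, hi = math.log(lam) / lam, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if 1 - mid - math.exp(-lam * mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def sigma_sq_over_n(lam):
    """sigma^2 / n = rho (1 - rho) / (1 - lam_*)^2 with lam_* = lam (1 - rho)."""
    r = rho(lam)
    lam_star = lam * (1 - r)
    return r * (1 - r) / (1 - lam_star) ** 2

if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.001):
        lam = 1 + eps
        # both ratios should approach 1 as eps decreases
        print(eps, rho(lam) / (2 * eps), sigma_sq_over_n(lam) / (2 / eps))
```

Both printed ratios differ from 1 by a term of order $\varepsilon$, consistent with the $O(\varepsilon^2)$ error terms in $\rho_\lambda$ and $\lambda_*$.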



2 The proof 



We consider the component exploration process as in [6], itself based on those of
Karp [4], Martin-Löf [5] and Aldous [1], although we shall use slightly different
terminology and initial conditions. At each step, every vertex will have one of
three states, active, explored, or unseen. The exploration will take place in n 
steps, at times t = 1, . . . , n, starting from the initial state at time 0, when every 
vertex is unseen. 

Fix an order on the vertices. At step $1 \le t \le n$ (i.e., going from time $t-1$
to time $t$) let $v_t$ be the first active vertex, if there are any; otherwise $v_t$ is the
first unseen vertex. In the latter case we say that we 'start a new component'
at step $t$. Having defined $v_t$, reveal all edges from $v_t$ to (other) unseen vertices;
let $\eta_t$ be the number of such edges, and label the corresponding neighbours of
$v_t$ as active; label $v_t$ itself as explored. After $t$ steps of the process, exactly $t$
vertices have been explored. We write $A_t$ and $U_t$ for the numbers of active and
unseen vertices after $0 \le t \le n$ steps, so $U_t = n - t - A_t$, $A_0 = 0$ and $U_0 = n$.

After n steps, it is very easy to see that the process has revealed a spanning 
forest in G, having first revealed a spanning tree of one component, then a 
spanning tree of another component (if there is more than one), and so on. 

Write $C_t$ for the number of components started by time $t$, and set $X_t =
A_t - C_t$. We claim that

$$X_t = A_t - C_t = \sum_{i=1}^{t} (\eta_i - 1). \tag{3}$$

Indeed, if in step $t$ we do not start a new component, then we explore an active
vertex and then change $\eta_t$ vertices from unseen to active, so $A_t - A_{t-1} = \eta_t - 1$
and $C_t = C_{t-1}$. If we do start a new component, which happens if and only
if $A_{t-1} = 0$, then we explore an unseen vertex, so $A_t - A_{t-1} = A_t = \eta_t$ and
$C_t - C_{t-1} = 1$. This establishes (3).

Let $0 = t_0 < t_1 < t_2 < \cdots < t_k = n$ enumerate $\{t : A_t = 0\}$, i.e., the
set of times at which there are no active vertices. We start exploring the $i$th
component at time $t_{i-1} + 1$ and finish at time $t_i$, so

$$L_1 = \max\{t_i - t_{i-1} : 1 \le i \le k\}. \tag{4}$$

Since $C_t = i$ for $t_{i-1} < t \le t_i$, recalling that $X_t = A_t - C_t$ we have

$$t_i = \inf\{t : X_t = -i\}. \tag{5}$$

Writing $c(G)$ for the number of components of $G = G(n,p)$, note that $X_n =
-c(G)$, and that $X_t$ may decrease by at most one at each step, so the infimum
is defined for all $1 \le i \le c(G)$.

Let $\mathcal{F}_t$ denote the sigma-field generated by $\eta_1, \ldots, \eta_t$; in other words, $\mathcal{F}_t$ is
the (finite, of course) sigma-field generated by all information revealed by step
$t$. Set $U'_t = U_t$ if $A_t > 0$ and $U'_t = U_t - 1$ otherwise. Then $U'_t$ is the number
of edges tested at step $t+1$. Hence, given $\mathcal{F}_t$, the random variable $\eta_{t+1}$ has a
binomial distribution with parameters $U'_t$ and $p$:

$$\mathbb{P}(\eta_{t+1} = k \mid \mathcal{F}_t) = \binom{U'_t}{k} p^k (1-p)^{U'_t - k}.$$

If we know the sequence $(\eta_t)$, then we know the entire outcome of the process,
and in particular $L_1$. More precisely, we can use (3) to find $(X_t)$, then (5) to
find the $t_i$ (and thus $(C_t)$, $(A_t)$ and $(U_t)$), and finally (4) gives us $L_1$.
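The recipe just described can be checked on a small example. The following sketch is an illustration under our own assumptions, not code from the paper: it samples $G(n,p)$ naively, runs the exploration with the vertex order $0, 1, \ldots, n-1$, recovers $L_1$ from the sequence $(\eta_t)$ alone via (3), (5) and (4), and compares the result with a direct breadth-first search. All function names are hypothetical.

```python
import random
from collections import deque

def sample_gnp(n, p, rng):
    """Adjacency sets of G(n, p) on vertices 0..n-1 (naive O(n^2) sampler)."""
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def explore(adj):
    """Run the exploration process and return [eta_1, ..., eta_n]."""
    n = len(adj)
    UNSEEN, ACTIVE, EXPLORED = 0, 1, 2
    state = [UNSEEN] * n
    etas = []
    for _ in range(n):
        # v_t is the first active vertex if there is one, else the first unseen one
        v = next((u for u in range(n) if state[u] == ACTIVE), None)
        if v is None:
            v = next(u for u in range(n) if state[u] == UNSEEN)
        state[v] = EXPLORED
        new = [w for w in adj[v] if state[w] == UNSEEN]
        for w in new:
            state[w] = ACTIVE
        etas.append(len(new))
    return etas

def largest_from_etas(etas):
    """Recover L_1 from (eta_t): X_t from (3), the times t_i from (5), then (4)."""
    x, finished, last, best = 0, 0, 0, 0
    for t, eta in enumerate(etas, start=1):
        x += eta - 1
        if x == -(finished + 1):       # X_t first hits -(i+1): t = t_{i+1}
            finished += 1
            best = max(best, t - last)
            last = t
    return best

def largest_component(adj):
    """Largest component size by breadth-first search, for comparison."""
    seen, best = [False] * len(adj), 0
    for s in range(len(adj)):
        if seen[s]:
            continue
        seen[s], queue, size = True, deque([s]), 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if not seen[w]:
                    seen[w] = True
                    queue.append(w)
        best = max(best, size)
    return best

if __name__ == "__main__":
    adj = sample_gnp(300, 1.5 / 300, random.Random(0))
    print(largest_from_etas(explore(adj)), largest_component(adj))
```

The two printed numbers agree for every graph, since the identity is deterministic; only the graph itself is random.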

So far we have been following (with minor modifications) the definitions and
initial analysis in [6]. But now our analysis takes a different route.

Let us write $D_t$ for the expectation of $\eta_t - 1$ given $\mathcal{F}_{t-1}$, noting that $D_t$ is
random, and satisfies

$$D_{t+1} = \mathbb{E}(\eta_{t+1} - 1 \mid \mathcal{F}_t) = pU'_t - 1.$$

Recalling that $U_t = n - t - A_t = n - t - X_t - C_t$, and noting that $U'_t =
U_t - (C_{t+1} - C_t)$, this gives

$$D_{t+1} = p(n - t - X_t - C_{t+1}) - 1. \tag{6}$$

Our next aim is to approximate the process $(X_t)$ that we wish to study by
a simpler process $(\tilde X_t)$, consisting of a deterministic term plus a term closely
related to a martingale. Let $\Delta_t = \eta_t - 1 - D_t$, so $\mathbb{E}(\Delta_t \mid \mathcal{F}_{t-1}) = 0$ by the
definition of $D_t$. From (3), (6) and $\eta_{t+1} - 1 = D_{t+1} + \Delta_{t+1}$ we obtain the
recurrence

$$X_{t+1} = (1-p)X_t + \Delta_{t+1} + p(n-t) - 1 - pC_{t+1}. \tag{7}$$

Let

$$x_t = n - t - n(1-p)^t,$$

so $x_0 = 0$ and

$$x_{t+1} = (1-p)x_t + p(n-t) - 1. \tag{8}$$

Subtracting (8) from (7) we see that

$$X_{t+1} - x_{t+1} = (1-p)(X_t - x_t) + \Delta_{t+1} - pC_{t+1},$$

whence

$$X_t - x_t = \sum_{i=1}^{t} (1-p)^{t-i} (\Delta_i - pC_i). \tag{9}$$

With this in mind, we define our approximating process by

$$\tilde X_t = x_t + \sum_{i=1}^{t} (1-p)^{t-i} \Delta_i. \tag{10}$$
Lemma 3. For any $p > 0$ and any $1 \le t \le n$ we have

$$|X_t - \tilde X_t| \le ptC_t.$$

Proof. From (9) and (10) we have

$$\tilde X_t - X_t = p\sum_{i=1}^{t} (1-p)^{t-i} C_i.$$

The result follows immediately since there are $t$ terms in the sum, each bounded
by $pC_t$. □
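Lemma 3 is a deterministic statement about each run of the process, so it can be tested directly. The sketch below is illustrative only and simulates just the counts, using the fact noted above that $\eta_{t+1}$ is binomial with parameters $U'_t$ and $p$ given the past; the function name `check_lemma3` and the auxiliary variable `y` (standing for $\tilde X_t - x_t$) are our own.

```python
import random

def check_lemma3(n, p, seed=0):
    """Simulate the exploration counts and compare X_t with its approximation.

    Returns the maximum over 1 <= t <= n of |X_t - Xtilde_t| - p * t * C_t,
    which Lemma 3 says is at most 0 (up to floating-point rounding).
    """
    rng = random.Random(seed)
    A, C = 0, 0        # A_t and C_t at time t = 0
    y = 0.0            # y_t = Xtilde_t - x_t, so y_0 = 0
    worst = float("-inf")
    for t in range(n):                                   # perform step t + 1
        new_component = (A == 0)
        U_prime = n - t - A - (1 if new_component else 0)        # U'_t
        eta = sum(rng.random() < p for _ in range(U_prime))      # Bin(U'_t, p)
        delta = eta - p * U_prime    # Delta_{t+1} = eta_{t+1} - 1 - D_{t+1}
        if new_component:
            A, C = eta, C + 1
        else:
            A += eta - 1
        X = A - C
        y = (1 - p) * y + delta      # recurrence equivalent to the sum in (10)
        s = t + 1
        x_s = n - s - n * (1 - p) ** s                   # deterministic term x_s
        worst = max(worst, abs(X - (x_s + y)) - p * s * C)
    return worst

if __name__ == "__main__":
    n = 500
    print(check_lemma3(n, 1.5 / n, seed=1))
```

Equality holds in Lemma 3 at $t = 1$, so the returned value is zero up to rounding rather than strictly negative.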

Let

$$S_t = \sum_{i=1}^{t} (1-p)^{-i} \Delta_i,$$

so $(S_t)$ is a martingale, and

$$\tilde X_t = x_t + (1-p)^t S_t. \tag{11}$$

As we shall see below, it is easy to obtain very precise results about the
distribution of $(\tilde X_t)$; before turning to the details, let us indicate in rather vague
terms why this should be the case.

The variance of each $\Delta_i$ is $O(1)$, so $S_t$ and hence $(1-p)^t S_t$ have variance $O(t)$
and size $O_p(\sqrt{t})$. It is true that the distribution of $\Delta_t$ depends on earlier values
of $X_i$ in a way that is hard to evaluate exactly, but the dependence is weak: the
conditional variance of $\Delta_t$ is simply $p(1-p)U'_{t-1}$, so if we can bound the earlier
$X_i$ within an additive error of $o(n)$, then we obtain a bound on the variance of
$\Delta_t$ accurate to within a factor $1 + o(1)$. This gives only an $o_p(\sqrt{t})$ additive error
in the martingale term, which is negligible compared to the random variation.
(It will turn out that we hit the giant component before seeing many other
components, so the additional $ptC_t$ error from Lemma 3 will be negligible.) This
strongly suggests that given that Theorem 1 is true, there should be a simple
proof based on the analysis of $(\tilde X_t)$. As we shall see, this is indeed the case.

From now on we assume that $p = \lambda/n$, where $\lambda = \lambda(n) > 1$ is bounded. More
explicitly, we assume $\lambda \le M$ for some constant $M$. Often, we write $\lambda = 1 + \varepsilon$;
we assume also that $\varepsilon^3 n \to \infty$.

For the moment, we study $(\tilde X_t)$. Let us first start with a standard observation;
the second part is a special case of Doob's maximal inequality [3, Ch. III,
Theorem 2.1].

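Lemma 4. Let $(Z_t)_{t=0}^{\infty}$ be a discrete-time martingale with filtration $(\mathcal{F}_t)$ and
mean $Z_0 = 0$. Write $I_t$ for the increment $Z_t - Z_{t-1}$. Then

$$\mathrm{Var}(Z_t) = \sum_{i=1}^{t} \mathrm{Var}(I_i) = \sum_{i=1}^{t} \mathbb{E}\bigl(\mathrm{Var}(I_i \mid \mathcal{F}_{i-1})\bigr), \tag{12}$$

and for any $M > 0$,

$$\mathbb{P}\Bigl(\max_{i \le t} |Z_i| \ge M\Bigr) \le \mathrm{Var}(Z_t)/M^2.$$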
Proof. For the first statement, observe that $\mathbb{E}I_i = 0$ for all $i$ and $\mathbb{E}Z_t = 0$, while
for $i < j$ we have $\mathbb{E}(I_i I_j) = \mathbb{E}(\mathbb{E}(I_i I_j \mid \mathcal{F}_{j-1})) = \mathbb{E}(I_i\,\mathbb{E}(I_j \mid \mathcal{F}_{j-1})) = \mathbb{E}(0) = 0$. Hence
$\mathrm{Var}(Z_t) = \mathbb{E}Z_t^2 = \mathbb{E}\bigl(\sum_i I_i\bigr)^2 = \sum_i \mathbb{E}I_i^2 = \sum_i \mathrm{Var}(I_i)$. Also,
$\mathbb{E}(\mathrm{Var}(I_i \mid \mathcal{F}_{i-1})) = \mathbb{E}(\mathbb{E}(I_i^2 \mid \mathcal{F}_{i-1})) = \mathbb{E}I_i^2$, proving (12).

For the second statement, apply Doob's maximal inequality. Alternatively,
simply modify the martingale if $|Z_i| \ge M$ holds for any $i$: let $T$ be the (random)
first such $i$, or $T = t$ if there is no such $i$, and set $Z'_j = Z_j$ for $j \le T$ and
$Z'_j = Z_T$ for $j > T$. Since $T$ is a stopping time, the conditional distribution
of $I'_i = Z'_i - Z'_{i-1}$ given $\mathcal{F}_{i-1}$ is either the same as that of $I_i$, or zero, so the
conditional variances of the $I'_i$ are at most those of the $I_i$. Hence, by (12),
$\mathrm{Var}(Z'_t) \le \mathrm{Var}(Z_t)$. Since $\max_{i \le t} |Z_i| \ge M$ if and only if $|Z'_t| \ge M$, applying
Chebyshev's inequality gives the result. □

Let us write $\mathrm{CBi}(m,p)$ for the centered binomial distribution obtained by
subtracting $mp$ from a random variable with binomial distribution $\mathrm{Bi}(m,p)$. Note
that the variance of this distribution is $mp(1-p)$. The conditional distribution
of $\Delta_t$ given $\mathcal{F}_{t-1}$ is exactly that of a centered binomial $\mathrm{CBi}(U'_{t-1}, p)$. (Previously,
we first subtracted one, and then centered, but of course this is the same
as centering directly.) It follows that the differences $I_i = S_i - S_{i-1} = (1-p)^{-i}\Delta_i$
satisfy

$$\mathrm{Var}(I_i \mid \mathcal{F}_{i-1}) = (1-p)^{-2i}\, U'_{i-1}\, p(1-p), \tag{13}$$

so

$$\mathrm{Var}(I_i \mid \mathcal{F}_{i-1}) \le (1-p)^{-2n} np \le (1 - M/n)^{-2n} M = O(1).$$

For any (deterministic) function $t = t(n)$, Lemma 4 thus gives

$$\sup_{i \le t} |S_i| = O_p(\sqrt{t}). \tag{14}$$

Let $f(t) = f_n(t) = n - t - ne^{-pt}$ be the continuous-time form of the idealized
trajectory of $(\tilde X_t)$ (and hence of $(X_t)$). It is easy to check that $|f(t) - x_t| = O(1)$,
uniformly in $p \le M/n$ and $0 \le t \le n$; our next lemma shows that $(\tilde X_t)$ remains
close to $f_n(t)$.

Lemma 5. For any $1 \le t = t(n) \le n$ we have

$$\sup_{i \le t} |\tilde X_i - f_n(i)| = O_p(\sqrt{t}).$$

Proof. Immediate from (14), (11) and $|f_n(t) - x_t| = O(1)$. □

Together, Lemmas 3 and 5 show that $(X_t)$ remains close to the idealized
trajectory $f(t)$, as long as $C_t$ is not too large. As in [6], the basic idea is now to
consider the solution $t_1 = \rho n$ to $f(t_1) = 0$, and choose a suitable $t_0$. We shall
show that in the interval $[t_0, t_1 - t_0]$ the function $f(t)$ is far enough away from
zero that $X_t$ remains positive, so no new component is started in this interval.
Then we consider more precisely the time when $X_t$ crosses below its previous
minimum level and use (5) to obtain Theorem 1.



We start by examining $f$. Note that

$$f'(t) = -1 + npe^{-pt} = p(n - t - f(t)) - 1, \tag{15}$$

and that $f''(t) = -np^2 e^{-pt}$ is negative and uniformly bounded in absolute value by $M^2/n$. Since
$f'(0) = np - 1 = \varepsilon$, it follows that if $t \le \varepsilon n/(2M^2)$, then $f'(t) \ge \varepsilon/2$ and,
integrating, that

$$f(t) \ge \varepsilon t/2. \tag{16}$$

From now on let us pick a function $\omega = \omega(n)$ tending to infinity slowly, in
particular with $\omega^6 = o(\varepsilon^3 n)$. Set

$$\sigma_0 = \sqrt{\varepsilon n}$$

and

$$t_0 = \omega\sigma_0/\varepsilon,$$

ignoring, as usual, the irrelevant rounding to integers. Note for later that $t_0 =
o(\varepsilon n)$.

Lemma 6. Let $Z = -\inf\{X_t : t \le t_0\}$ denote the number of components
completely explored by time $t_0$, and let $T_0 = \inf\{t : X_t = -Z\}$ be the time
at which we finish exploring the last such component. Then $Z \le \sigma_0/\omega$ and
$T_0 \le \sigma_0/(\varepsilon\omega)$ hold whp.

Considering the initial trajectory of the process $(X_t)$, it is not hard to check
that in fact $Z = O_p(\varepsilon^{-1})$ and $T_0 = O_p(\varepsilon^{-2})$, but the weaker bounds above
suffice.

Proof. Let $k = \sigma_0/\omega$. Note that by choice of $\omega$ we have $k/\sqrt{t_0} \to \infty$. Let $A$
denote the event that $\sup_{t \le t_0} |\tilde X_t - f(t)| < k/2$. Then by Lemma 5, $A$ holds
whp.

At time $T_0$ we have $X_{T_0} = -Z$. Noting that $pt_0 = o(1)$, we have $pt_0 \le 1/2$
if $n$ is large enough, which we assume from now on. Since $T_0 \le t_0$ by definition,
it follows that $pT_0 \le 1/2$. But then Lemma 3 gives

$$|X_{T_0} - \tilde X_{T_0}| \le pT_0 C_{T_0} \le Z/2,$$

and thus $\tilde X_{T_0} \le -Z/2$. Since $f(t) \ge 0$ for $0 \le t \le \rho n$, this gives $|\tilde X_{T_0} - f(T_0)| \ge
Z/2$. Hence, whenever $A$ holds, we have $Z < k$, and the first statement follows.

Turning to the second statement, recall from (16) that $f(t) \ge \varepsilon t/2$ for $t \le
t_0 = o(\varepsilon n)$. Consider the interval $I = [\sigma_0/(\varepsilon\omega), t_0]$. In this interval we have
$f(t) \ge \sigma_0/(2\omega) = k/2$, so if $A$ holds then $\tilde X_t > 0$ for all $t \in I$. As shown above,
we have $\tilde X_{T_0} \le -Z/2 \le 0$, so whenever $A$ holds then $T_0 \notin I$. Since $T_0 \le t_0$ by
definition, this completes the proof. □

Let $T_1 = \inf\{t : X_t = -Z - 1\}$. Then by the properties of the exploration
process, there is a component with $T_1 - T_0$ vertices; we aim to show that this
component has size close to the anticipated size of the giant component.
Since $np = O(1)$, by Lemmas 3 and 6 we have that

$$\sup_{t \le T_1} |X_t - \tilde X_t| \le \sigma_0/\sqrt{\omega} \tag{17}$$

holds whp.

Let $t_1 = \rho n$, noting that $t_1 \sim 2\varepsilon n$ if $\varepsilon \to 0$, and that $t_1$ is the unique
positive solution to $f(t) = 0$. Let $t_1^- = t_1 - t_0$ and $t_1^+ = t_1 + t_0$. Note that
$t_1^+ = O(\varepsilon n) = O(\sigma_0^2)$. From (17) and Lemma 5 we have that

$$\sup_{t \le \min\{T_1,\, t_1^+\}} |X_t - f(t)| \le \sqrt{\omega}\,\sigma_0 \tag{18}$$

holds whp.

Let $a = -f'(t_1)$, so from (15) and the definition of $t_1$ we have

$$a = -f'(t_1) = 1 - p(n - t_1) = 1 - \lambda(1 - \rho) = 1 - \lambda_*,$$

where $\lambda_*$ is the dual branching process parameter to $\lambda$. In particular, $a = \Theta(\varepsilon)$.
Since $f(t_1) = 0$ and $f''(t)$ is uniformly $O(1/n)$, recalling that $t_0 = o(\varepsilon n)$ it
follows easily that $f(t_1^-)$ and $-f(t_1^+)$ are both of order $\varepsilon t_0 = \omega\sigma_0$. To be concrete,
if $n$ is large enough, then we certainly have

$$f(t_1^-) \ge 10\sqrt{\omega}\,\sigma_0 \quad\text{and}\quad f(t_1^+) \le -10\sqrt{\omega}\,\sigma_0,$$

say. Since $f(t_0) \ge \varepsilon t_0/2 \ge 10\sqrt{\omega}\,\sigma_0$ and $f$ is unimodal, we have $\inf_{t_0 \le t \le t_1^-} f(t) \ge
10\sqrt{\omega}\,\sigma_0$. Let $B$ denote the event described in (18). Then, whenever $B$ holds,
we have $X_t > 0$ for $t_0 \le t \le \min\{T_1, t_1^-\}$. Since $X_{T_1} = -Z - 1 < 0$, this implies
$T_1 > t_1^-$.

Recall from Lemma 6 that (crudely) $Z \le \sigma_0$ whp. Suppose $Z \le \sigma_0$, $B$ holds,
and $T_1 > t_1^+$. Then from $B$ and the bound on $f(t_1^+)$ we have $X_{t_1^+} \le -9\sqrt{\omega}\,\sigma_0 <
-Z$, contradicting $T_1 > t_1^+$. It follows that $T_1 \le t_1^+$ holds whp.

At this point we have shown that $|T_1 - t_1| \le t_0$ holds whp, which gives
$|T_1 - T_0 - t_1| \le 2t_0$. Since $\omega$ may tend to infinity arbitrarily slowly, this already
shows that $T_1 - T_0 = t_1 + O_p(\sigma_0/\varepsilon) = \rho n + O_p(\sqrt{n/\varepsilon})$. To go further, we next
analyze the distribution of $\tilde X_{t_1}$ more precisely.

From Lemma 6 and the bound $T_1 > t_1^-$ whp just proved, whp we have
$C_{t_1^-} \le Z + 1 \le \sigma_0/\omega + 1$. Noting that $t_0 = t_1 - t_1^- = o(n)$, it follows that $\mathbb{E}C_{t_1} = o(n)$.
Lemma 3 and Lemma 5 thus give $|X_t - f(t)| = o_p(n)$, uniformly in $t \le t_1$. Since
$X_t - f(t)$ is deterministically bounded by $n$, it follows that $\mathbb{E}|X_t - f(t)|$ and hence
$\mathbb{E}|X_t + C_{t+1} - f(t)|$ are $o(n)$, uniformly in $t < t_1$. Let $u_t = n - t - f(t) = ne^{-pt}$.
Since $U'_t = n - t - (X_t + C_{t+1})$, we have shown that

$$\sum_{t=0}^{t_1 - 1} \mathbb{E}|U'_t - u_t| = o(t_1 n) = o(\varepsilon n^2). \tag{19}$$
Note that

$$p(1-p)\sum_{t=0}^{t_1-1} (1-p)^{-2t} u_t \sim p\sum_{t=0}^{t_1-1} e^{2pt}\, ne^{-pt} \sim n^2 p\int_0^{\rho} e^{\lambda x}\,dx = n\lambda\lambda^{-1}(e^{\lambda\rho} - 1) = n\rho/(1-\rho), \tag{20}$$

using $e^{-\lambda\rho} = 1 - \rho$ in the last step.
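The asymptotic evaluation in (20) can be sanity-checked numerically. The following sketch is purely illustrative (the function names are ours, and the integer part of $\rho n$ is used for $t_1$): it computes the left-hand sum with $u_t = ne^{-pt}$ exactly and compares it with $n\rho/(1-\rho)$.

```python
import math

def survival_probability(lam):
    """Positive root of 1 - r = exp(-lam * r) for lam > 1, by bisection."""
    lo, hi = math.log(lam) / lam, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if 1 - mid - math.exp(-lam * mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def variance_sum(n, lam):
    """Return (sum in (20) with u_t = n exp(-p t), limiting value n rho / (1 - rho))."""
    p = lam / n
    rho = survival_probability(lam)
    t1 = int(rho * n)                      # rounding is irrelevant asymptotically
    total = sum((1 - p) ** (-2 * t) * n * math.exp(-p * t) for t in range(t1))
    return p * (1 - p) * total, n * rho / (1 - rho)

if __name__ == "__main__":
    for n in (10**3, 10**4, 10**5):
        lhs, rhs = variance_sum(n, 1.5)
        print(n, lhs / rhs)                # ratio should approach 1 as n grows
```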

Lemma 7. The distribution of $S_{t_1}$ is asymptotically normal with mean $0$ and
variance $n\rho/(1-\rho)$.

Proof. Recall that $(S_t)$ is a martingale with $S_0 = 0$, and that the conditional
distribution of the $i$th difference $(1-p)^{-i}\Delta_i$ is $(1-p)^{-i}$ times a centered binomial
$\mathrm{CBi}(U'_{i-1}, p)$, and has conditional variance given by (13). The result follows
easily by a standard martingale central limit theorem such as that of Brown [2, Theorem
2]. Note that here the differences are not uniformly bounded. However, we can
write $\Delta_i$ as the sum of a random number $U'_{i-1}$ of $\mathrm{CBi}(1,p)$ random variables,
plus $n - U'_{i-1}$ zero variables. We can take the new variables multiplied by
$(1-p)^{-i}$ as the differences of a martingale $(S'_j)$ with the property that $S_t = S'_{nt}$.
In this way we obtain a martingale with the same (random) final value in which
the differences are bounded by $(1-p)^{-n} = O(1)$. The (random) sum of the
(old or new) conditional variances is exactly $s = \sum_{t=1}^{t_1} (1-p)^{-2t}\, U'_{t-1}\, p(1-p)$.
By (19) and (20) the ratio of $s$ to $n\rho/(1-\rho)$ converges to 1 in probability, as
required for the martingale central limit theorem. □

To relate the distribution of $T_1$ to that of $X_{t_1}$ (or $\tilde X_{t_1}$) we use the fact
that $(\tilde X_t)$ has slope approximately $-a$ near $t_1$; a similar argument was given by
Martin-Löf [5].

Lemma 8. We have

$$\sup_{|t - t_1| \le t_0} |\tilde X_t - \tilde X_{t_1} - a(t_1 - t)| = o_p(\sigma_0).$$

Proof. From (11) we may write $\tilde X_t - \tilde X_{t_1}$ as

$$x_t - x_{t_1} + (1-p)^t S_t - (1-p)^{t_1} S_{t_1} = (f(t) - f(t_1)) + (1-p)^t S_t - (1-p)^{t_1} S_{t_1} + O(1).$$

Recalling that $f'(t_1) = -a$ and $f''(t) = O(1/n)$ uniformly in $t$, the difference
between the first term and $a(t_1 - t)$ is $O(|t - t_1|^2/n) = O(t_0^2/n) = o(\sigma_0)$. For
the rest, note that

$$|(1-p)^t - (1-p)^{t_1}| \le |1 - (1-p)^{|t - t_1|}| \le p|t - t_1| \le pt_0.$$

Since $S_{t_1} = O_p(\sqrt{t_1})$ and $pt_0\sqrt{t_1} = O(n^{-1}\omega\sigma_0\varepsilon^{-1}\sqrt{\varepsilon n}) = o(\sigma_0)$, it thus suffices
to show that $\sup_{|t - t_1| \le t_0} |S_t - S_{t_1}| = o_p(\sigma_0)$. But this follows easily by applying
Lemma 4 to the martingale $(S_t - S_{t_1^-})_{t = t_1^-}^{t_1^+}$, which has final variance $O(t_0)$. □
Proof of Theorem 1. Recall from Lemma 6 that $Z$, the number of components
explored by time $t_0$, satisfies $Z = o_p(\sigma_0)$. We have shown above that whp
$T_1 = \inf\{t : X_t = -Z - 1\}$ lies between $t_1^-$ and $t_1^+$. From (17), $\tilde X_t$ is within
$o_p(\sigma_0)$ of $X_t$ at least until $T_1$. It follows that at time $T_1$, we have $\tilde X_{T_1} = o_p(\sigma_0)$.
Since $a = \Theta(\varepsilon)$, Lemma 8 thus gives

$$T_1 = t_1 + \tilde X_{t_1}/a + o_p(\sigma_0/\varepsilon). \tag{21}$$

From Lemma 7, (11) and the fact that $f(t_1) = 0$, we have that $\tilde X_{t_1}$ is asymptotically
normal with mean $0$ and variance

$$(1-p)^{2t_1}\, n\rho/(1-\rho) \sim e^{-2\lambda\rho}\, n\rho/(1-\rho) = n\rho(1-\rho).$$

Hence $\tilde X_{t_1}/a$ is asymptotically normal with mean $0$ and variance

$$n\rho(1-\rho)/a^2 = \sigma^2.$$

Since this variance is of order $n/\varepsilon = \varepsilon^{-2}\sigma_0^2$, the $o_p(\sigma_0/\varepsilon)$ error term in (21)
is irrelevant, and $T_1$ is asymptotically normal with mean $t_1 = \rho n$ and variance
$\sigma^2$. Finally, from Lemma 6 we have $T_0 = o_p(\sigma_0/\varepsilon)$. It follows that $T_1 - T_0$ is
asymptotically normal with the parameters claimed in the theorem.

This shows the existence of a component with the claimed size. As shown by
Nachmias and Peres [6], it is easy to check that the rest of the graph corresponds
to a subcritical random graph, and whp will not contain a larger component. □

Acknowledgement. We are grateful to an anonymous referee for several
suggestions improving the presentation of the paper.